DETAILED ACTION 

This is the initial response to the application filed on June 20, 2003. Claims 1-21 are 
pending and are considered below. 

Specification 

The disclosure is objected to because of the following informalities: Numerous 
grammatical and spelling errors exist. For example "the system 100 utilizes a variational 
approach is taken" on page 5 lines 13-14, and "build-in" on page 6 line 9. Also, 
"Kullback-Liebler" is misspelled, and should read "Kullback-Leibler". Examiner has 
provided examples, and not a complete listing of grammatical and spelling errors. 
Therefore applicant is encouraged to review the remaining specification and correct any 
and all errors. 

Appropriate correction is required. 

Information Disclosure Statement 

The information disclosure statement filed December 8, 2003 fails to comply with 
37 CFR 1.98(a)(2), which requires a legible copy of each cited foreign patent document; 
each non-patent literature publication or that portion which caused it to be listed; and all 
other information or that portion which caused it to be listed. It has been placed in the 
application file, but the information referred to therein has not been considered. The 
non-patent literature cited on the statement does not have the corresponding document 
scanned into IFW; therefore, unless noted in the PTO-892, those documents have 
not been considered. 

Claim Objections 

Claims 4, 9, 10, 16 and 18 are objected to because of the following informalities: 
Claims 4 and 16 use the variable N, which is not defined; claim 9 uses the variables A, I, 
and u, which are not defined; claim 10 uses B and u, which are not defined; and claim 
18 uses N, which is not defined. 

Appropriate correction is required. 

Claim Rejections - 35 USC § 112 

The following is a quotation of the first paragraph of 35 U.S.C. 112: 

The specification shall contain a written description of the invention, and of the manner and process of 
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the 
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall 
set forth the best mode contemplated by the inventor of carrying out his invention. 

Claim 10 is rejected under 35 U.S.C. 112, first paragraph, as failing to comply 
with the written description requirement. The claim(s) contains subject matter which 
was not described in the specification in such a way as to reasonably convey to one 
skilled in the relevant art that the inventor(s), at the time the application was filed, had 
possession of the claimed invention. Claim 10 uses the variable B_s; however, this 
variable is not defined in either the claims or the specification. The examiner interprets 
this value as corresponding to duration parameter(s). This interpretation is used 
throughout the remainder of the office action. 

Application/Control Number: 10/600,798 
Art Unit: 2626 



Claim Rejections - 35 USC § 101 

35 U.S.C. 101 reads as follows: 

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of 
matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the 
conditions and requirements of this title. 

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is 
directed to non-statutory subject matter. 

Claim 1 recites a system which does fall within one of the statutory categories. 
However, claim 1 recites "an input component" and "a model component" which are 
software components within a larger computer program, as evidenced by the specification 
as well as claim 20. Therefore claim 1 is claiming an abstract idea, or functional 
descriptive material, and since no practical application is provided, the claim is directed 
towards non-statutory subject matter. 

Claims 12, 14 and 17 are rejected for similar reasons, i.e. they claim functional 
descriptive material, as evidenced by the specification and claim 20. 

Claim 19 recites "a data packet transmitted between two or more computer 
components"; however, a data packet does not fall into one of the statutory categories. A 
data packet is merely information, or an abstract idea, and thus non-statutory. 

Claim 20 recites "a computer readable medium storing computer executable 
components of a system" which does not meet the requirements set forth in The Interim 
Guidelines. A computer component, as defined by the specification, can be software, 
hardware, or a combination of the two. However, a computer readable medium storing 
hardware is not physically possible, nor does it meet the requirements as set forth by the 
Interim Guidelines, which dictate the procedure for appropriately claiming software, and 
only software, on a computer readable medium. Thus, for the previous reasons claim 20 
is non-statutory. 

Independent claims 1, 12, 14, 17, 19 and 20 are all non-statutory, thus rendering 
all dependent claims non-statutory. 

Claim Rejections - 35 USC § 102 

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

Claims 1-3, 5, 11-13 and 19-21 are rejected under 35 U.S.C. 102(b) as being 
anticipated by Hogden (6,052,662). 

As per claims 1, 20 and 21, Hogden discloses a system and computer readable medium 
(claim 1) that facilitates modeling unobserved speech dynamics comprising: 
• an input component that receives acoustic data (column 10 line 49); 
• a model component that models speech based, at least in part, upon the acoustic 
data, the model component comprising model parameters which characterize 
aspects of the unobserved dynamics (pseudo-articulator positions) in speech 
articulation, and which characterize a mapping relationship from the unobserved 
dynamic variables (pseudo-articulator positions) to observed speech acoustics, 
the model parameters modified based, at least in part, upon a variational learning 
technique (training, i.e. adjustment of PDF parameters), and a technique for 
decoding an underlying unobserved phone sequence of speech based, at least in 
part, upon a variational learning technique (column 5 lines 10-25 and column 8 
lines 5-9). 

As per claim 12, Hogden discloses a method that facilitates modeling speech dynamics 
comprising: 

• recovering speech from acoustic data based, at least in part, upon a speech 
model having at least two sets of parameters, a first set of parameters describing 
unobserved speech dynamics (pseudo-articulator positions) and a second set of 
parameters describing a relationship between the unobserved speech dynamic 
vector and an observed acoustic feature vector (column 5 lines 10-33, the 
continuity map provides a mapping between a variable and a map position, i.e. a 
sound type and its articulation); 

• calculating a posterior distribution based on the above model parameters 
(column 5 lines 10-33, PDF); and, 

• modifying at least one of the model parameters based, at least in part, upon the 
calculated posterior distribution (column 8 lines 5-9, adjust parameters of the 
PDF). 

As per claim 19, Hogden discloses a data packet transmitted between two or more 
computer components that facilitates modeling of speech dynamics, the data packet 
comprising: data associated with recovered speech, the recovered speech being based, 
at least in part, upon a speech model based upon acoustic data and model parameters, and 
the model parameters including at least one articulation parameter and at least one 
duration parameter (column 20 lines 54-63 and column 6 lines 53-61, speech is 
encoded as a pseudo-articulator path, or position, the path including articulator position 
during a particular time (articulation and duration)). 

As per claim 2, Hogden discloses the system of claim 1, as well as modification of at 
least one of the model parameters being based upon a variational Expectation 
Maximization algorithm having an E-step and M-step (column 8 lines 45-51, a path that 
maximizes the conditional probability data is determined). 

As per claim 3, Hogden discloses the system of claim 2, as well as modification of at 
least one of the model parameters being based, at least in part, upon a mixture of 
Gaussian (MOG) posteriors based on a variational technique (column 9 lines 14-17). 

As per claim 5, Hogden discloses the system of claim 2, as well as modification of at 
least one of the model parameters being based, at least in part, upon a mixture of 
hidden Markov model (HMM) posteriors based on a variational technique (column 6 
lines 24-46). 

As per claim 11, Hogden inherently discloses a speech recognition system employing 
the system of claim 1 (column 1 lines 24-26). 

As per claim 13, Hogden inherently discloses the method of claim 12 further 
comprising receiving the acoustic data (column 10 line 49). 

Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

Claims 6, 17 and 18 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Hogden in view of McDonough (5,652,748). 

As per claim 6, Hogden discloses the system of claim 1, the model component 
selecting an approximate posterior distribution relating to the acoustic data (column 5 
lines 10-33), but does not disclose optimizing a posterior distribution by minimizing a 
Kullback-Leibler (KL) distance thereof to an exact posterior distribution. McDonough 
discloses the use of the Kullback-Leibler distance to determine a likely sequence of 
words or phrases in a speech recognition system (column 11 lines 40-63). In addition, 
McDonough discloses that the Kullback-Leibler distance is known in the art, and one of 
many types of probability models used, any of which would produce an accurate and 
useful result. 

Therefore it would have been obvious to one of ordinary skill in the art at 
the time of the invention to modify at least one of the model parameters based, at least 
in part, upon the calculated approximated posterior distribution and minimization of a 
Kullback-Leibler distance of the approximation from an exact posterior distribution in 
Hogden, since the Kullback-Leibler distance is one of many probability models 
commonly used, therefore enabling the use of readily available software products or 
algorithms designed for its use. 
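
For reference, the minimization of a Kullback-Leibler distance between an approximation and an exact posterior can be illustrated numerically. The following is a minimal sketch only and is not taken from Hogden, McDonough, or the instant application: it assumes discrete distributions, and the names `kl_divergence` and `best_approximation` as well as the example values are hypothetical. It selects, from a small set of candidate approximations q, the one with the smallest KL(q || p) relative to an exact posterior p.

```python
import math


def kl_divergence(q, p):
    """Return KL(q || p) for two discrete distributions of equal length."""
    if len(q) != len(p):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for qi, pi in zip(q, p):
        if qi == 0.0:
            continue  # a term with q_i = 0 contributes nothing
        if pi == 0.0:
            return math.inf  # q places mass where p has none
        total += qi * math.log(qi / pi)
    return total


def best_approximation(candidates, exact):
    """Return the candidate q that minimizes KL(q || exact)."""
    return min(candidates, key=lambda q: kl_divergence(q, exact))


# Hypothetical example values, chosen only for illustration.
exact_posterior = [0.7, 0.2, 0.1]
candidates = [
    [1 / 3, 1 / 3, 1 / 3],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
]
chosen = best_approximation(candidates, exact_posterior)
```

In this sketch the second candidate is selected, since it is the closest of the three to the exact posterior in the Kullback-Leibler sense.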

As per claim 17, Hogden discloses a method that facilitates modeling speech dynamics 
comprising: 

• recovering speech from acoustic data based, at least in part, upon a speech 
model (column 6 lines 24-46); 

• calculating an approximation of a posterior distribution based on model 
parameters, the model parameters and the approximation based upon a hidden 
Markov model posteriors (column 6 lines 24-46); 

However, Hogden does not disclose modifying at least one of the model parameters 
based, at least in part, upon the calculated approximated posterior distribution and 
minimization of a Kullback-Leibler distance of the approximation from an exact posterior 
distribution. McDonough discloses the use of the Kullback-Leibler distance to 
determine a likely sequence of words or phrases in a speech recognition system 
(column 11 lines 40-63). In addition, McDonough discloses that the Kullback-Leibler 
distance is known in the art, and one of many types of probability models used, any of 
which would produce an accurate and useful result. 

Therefore it would have been obvious to one of ordinary skill in the art at 
the time of the invention to modify at least one of the model parameters based, at 
least in part, upon the calculated approximated posterior distribution and minimization of 
a Kullback-Leibler distance of the approximation from an exact posterior distribution in 
Hogden, since the Kullback-Leibler distance is one of many probability models 
commonly used, therefore enabling the use of readily available software products or 
algorithms designed for its use. 

As per claim 18, Hogden in view of McDonough discloses the method of claim 17, and 
Hogden further discloses calculation of the approximation of the posterior distribution 
being based, at least in part, upon: the recited equation, where x is a state of the 
model, s is a phone index, n is a frame number, and q is a posterior probability 
approximation (column 9 lines 27-28, Equation 2, which for conditional independence 
among frames, becomes the same function as the equation in the instant application). 

Claims 4 and 7-10 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Hogden in view of Ghahramani ("Variational Learning for Switching State-Space 
Models" Ghahramani et al., Neural Computation 2000). 

As per claim 4, Hogden discloses the system of claim 3, but does not disclose the 
model component being based, at least in part, upon: the recited equation. 
Ghahramani discloses the use of a probability approximation equation comprising a 
product of probabilities (page 7, Section 3: The Generative Model, equation 2). The 
equation of the instant application is the standard joint probability equation, modified for 
independent frames to produce a product of probabilities. 

Therefore it would have been obvious to one of ordinary skill in the art at 
the time of the invention to use the equation, as noted previously, in Hogden, since it is 
an established formula used within the statistics discipline, therefore enabling the use of 
readily available software products or algorithms designed for its use. 
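
The factorization referred to above, a joint probability written as a product of per-frame probabilities, can be written out as a sketch. The symbols below are assumptions made for illustration (y_n for the acoustic observation at frame n, x_n for the hidden state at frame n, and N for the number of frames); this is the generic joint probability under frame independence, not the recited equation of claim 4 and not equation 2 of Ghahramani.

```latex
p(y_{1:N}, x_{1:N})
  = \prod_{n=1}^{N} p(y_n, x_n)
  = \prod_{n=1}^{N} p(y_n \mid x_n)\, p(x_n)
```

The first equality holds only under the assumption that frames are mutually independent; the second is the product rule applied to each frame.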

As per claims 7 and 8, Hogden discloses the system of claim 1, as well as the model 
component being based, at least in part, upon a hidden dynamic model (Abstract, 
probabilistic mapping between speech sounds and articulator positions) but does not 
disclose the model being in the form of a segmental switching state space model. 
Ghahramani discloses a probabilistic time-series model in the form of a segmental 
switching state space model (page 7 section 3: The Generative Model). In addition, 
Ghahramani discloses that the switching state space model can be used in a wide 
range of disciplines, including signal processing. The speech recognition discipline is a 
subset of signal processing; therefore Ghahramani suggests that these models can be 
implemented as speech recognition models. 

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to use a model in the form of a segmental switching state space model in 
Hogden, since it can accurately represent dynamic phenomena, such as speech, that are 
characterized by a combination of discrete and continuous dynamics, as indicated in 
Ghahramani (introduction). 

As per claims 9 and 10, Hogden in view of Ghahramani discloses the system of claim 7, 
and Ghahramani further discloses the model component employing, at least in part, the 
state equation: the recited equation, and probability distributions: the recited equation 
(page 2, Section 2.1 State-space model, equations (5) and (3) and equation (1)). 

Therefore it would have been obvious to one of ordinary skill in the art to use the 
equation, as noted previously, in Hogden, since it would accurately model the input and 
output behavior of a system, i.e. the conditional probability of an output given a specific 
input, as indicated in Ghahramani (page 2 section 2.1 State-space models). 

Claims 14-16 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Hogden in view of Ghahramani, and further in view of McDonough. 

As per claim 14, Hogden discloses a method that facilitates modeling speech dynamics 
comprising: calculating an approximation of a posterior distribution based on model 
parameters, the model parameters and the approximation based upon a mixture of 
Gaussians (column 9 lines 14-17). However, Hogden does not disclose recovering 
speech from acoustic data based, at least in part, upon a speech model in the form of 
a segmental switching state space model, and modifying at least one of the model 
parameters based, at least in part, upon the calculated approximated posterior 
distribution and minimization of a Kullback-Leibler distance of the approximation from an 
exact posterior distribution. 

Ghahramani discloses recovering speech from acoustic data based, at least in 
part, upon a speech model in the form of a segmental switching state space model (page 
7 section 3: The Generative Model). Ghahramani also discloses that the switching state 
space model can be used in a wide range of disciplines, including signal processing. 
The speech recognition discipline is a subset of signal processing; therefore 
Ghahramani suggests that these models can be implemented as speech recognition 
models. In addition, McDonough discloses modifying at least one of the model 
parameters based, at least in part, upon the calculated approximated posterior 
distribution and minimization of a Kullback-Leibler distance of the approximation from an 
exact posterior distribution (column 11 lines 40-47). Hogden, Ghahramani and 
McDonough all disclose systems that model observations in relation to states, or 
hidden states, for the purpose of speech recognition. 

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to recover speech from acoustic data based, at least in part, upon a 
speech model in the form of a segmental switching state space model, and to modify at 
least one of the model parameters based, at least in part, upon the calculated 
approximated posterior distribution and minimization of a Kullback-Leibler distance of 
the approximation from an exact posterior distribution in Hogden, since a segmental 
switching state-space model can accurately represent dynamic phenomena, such as 
speech, that are characterized by a combination of discrete and continuous dynamics, as 
indicated in Ghahramani (introduction), and the Kullback-Leibler distance is one of 
many probability models commonly used, therefore enabling the use of readily available 
software products or algorithms designed for its use. 

As per claim 15, Hogden in view of Ghahramani further in view of McDonough 
discloses the method of claim 14, and Hogden further discloses receiving the acoustic 
data (column 10 line 49). 

As per claim 16, Hogden in view of Ghahramani further in view of McDonough 
discloses the method of claim 14, and Ghahramani further discloses calculation of the 
approximation of the posterior distribution being based, at least in part, upon: (see 
the equation of claim 16) (page 7, Section 3: The Generative Model). Ghahramani discloses 
the use of a probability approximation equation comprising a product of probabilities 
(page 7, Section 3: The Generative Model). In addition, the equation of the instant 
application is the standard joint probability equation, modified for independent frames to 
produce a product of probabilities. The joint probability equation has been used in the 
discipline of statistics for many years, and is an established and well known equation. 

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to use the equation, as noted previously, in Hogden, since it is an 
established formula used within the statistics discipline which is an effective way to 
determine the chances of two events occurring at the same time. 

Conclusion 

The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

• Papcun (5,440,661) discloses a system which uses inferred articulatory 
movements as part of speech recognition. 

• Picone ("Initial Evaluation of Hidden Dynamic Models on Conversational 
Speech" Picone et al., IEEE 1999) discloses the use of Hidden Dynamic 
Models for the inference of targets in a hidden feature space. 

• Richards ("Vocal Tract Shape Trajectory Estimation Using MLP 
Analysis-by-synthesis" IEEE 1997) discloses a system that uses acoustic speech 
signals to infer vocal tract shape trajectories. 

• Ma ("A Mixture Linear Model with Target-Directed Dynamics for 
Spontaneous Speech Recognition" IEEE 2002) discloses an MLDM model 
which is used to represent VTR behavior, and map the VTR to acoustic 
representations. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Dorothy Sarah Siedler whose telephone number is 
571-270-1067. The examiner can normally be reached on Mon-Thur 9:30am-5:30pm. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Richemond Dorvil, can be reached on 571-272-7602. The fax phone 
number for the organization where this application or proceeding is assigned is 571- 